Structured Generative Models of Natural Source Code

Authors

  • Chris J. Maddison
  • Daniel Tarlow
Abstract

We study the problem of building generative models of natural source code (NSC); that is, source code written and understood by humans. Our primary contribution is to describe a family of generative models for NSC that have three key properties: First, they incorporate both sequential and hierarchical structure. Second, we learn a distributed representation of source code elements. Finally, they integrate closely with a compiler, which allows leveraging compiler logic and abstractions when building structure into the model. We also develop an extension that includes more complex structure, refining how the model generates identifier tokens based on what variables are currently in scope. Our models can be learned efficiently, and we show empirically that including appropriate structure greatly improves the models, measured by the probability of generating test programs.
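The scope-aware extension described above can be illustrated with a toy sketch. This is not the authors' learned model: the grammar, the probabilities, and the depth cutoff are all invented for illustration. It shows only the key structural idea, that the generative process emits identifier tokens by sampling from the set of variables currently in scope rather than from an open vocabulary.

```python
import random

random.seed(0)

# Toy expression grammar: nonterminal -> list of (production, probability).
# Probabilities are illustrative, not learned.
GRAMMAR = {
    "expr": [(("num",), 0.4), (("ident",), 0.3), (("expr", "+", "expr"), 0.3)],
}

def sample_expr(scope, depth=0):
    """Sample a toy expression; identifier tokens come only from `scope`."""
    # Force a terminal production past depth 3 so samples stay finite.
    if depth >= 3:
        choices = [c for c in GRAMMAR["expr"] if c[0] != ("expr", "+", "expr")]
    else:
        choices = GRAMMAR["expr"]
    prods, probs = zip(*choices)
    prod = random.choices(prods, weights=probs)[0]
    out = []
    for sym in prod:
        if sym == "num":
            out.append(str(random.randint(0, 9)))
        elif sym == "ident":
            # The scope-aware step: only in-scope variables are candidates.
            out.append(random.choice(sorted(scope)) if scope
                       else str(random.randint(0, 9)))
        elif sym == "expr":
            out.append(sample_expr(scope, depth + 1))
        else:
            out.append(sym)
    return " ".join(out)

sample = sample_expr(scope={"x", "y"})
print(sample)
```

Every identifier in the sampled string is guaranteed to be `x` or `y`, which is the constraint the paper's extension builds into the model's identifier distribution.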

Similar Papers

Tree-structured Variational Autoencoder

Many kinds of variable-sized data we would like to model contain an internal hierarchical structure in the form of a tree, including source code, formal logical statements, and natural language sentences with parse trees. For such data it is natural to consider a model with matching computational structure. In this work, we introduce a variational autoencoder-based generative model for tree-str...
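The idea of a model whose computation matches the tree structure of its input can be sketched minimally. This is an illustrative recursive encoder only, not the paper's architecture: the dimension, weights, and leaf vocabulary are assumed for the example. Each subtree is summarized bottom-up into a fixed-size vector by a shared combiner.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8
W = rng.normal(scale=0.1, size=(2 * DIM, DIM))          # shared combiner weights
LEAF = {tok: rng.normal(size=DIM) for tok in ("a", "b", "c")}  # toy leaf embeddings

def encode(tree):
    """Encode a binary tree given as nested tuples, e.g. (('a', 'b'), 'c').

    The recursion mirrors the tree: the encoder's computation graph has
    exactly the same shape as the input structure.
    """
    if isinstance(tree, str):                # leaf token
        return LEAF[tree]
    left, right = tree
    h = np.concatenate([encode(left), encode(right)]) @ W
    return np.tanh(h)                        # one vector per subtree

root = encode((("a", "b"), "c"))
print(root.shape)
```

A tree-structured autoencoder pairs an encoder like this with a decoder that unfolds the vector back into a tree; the variational variant additionally treats the root vector as a latent code with a learned posterior.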

Discriminative Models for Semi-Supervised Natural Language Learning

An interesting question surrounding semisupervised learning for NLP is: should we use discriminative models or generative models? Despite the fact that generative models have been frequently employed in a semi-supervised setting since the early days of the statistical revolution in NLP, we advocate the use of discriminative models. The ability of discriminative models to handle complex, high-di...

Syntactic Topic Models for Language Generation

Since topic models’ inception as probabilistic generative models, it has only been natural to imagine actually applying the generative process to create documents. However, most topic models consist of a generative process that only provides a bag of words which is one critical step short of creating a readable text. With the recent introduction of syntactically sound topic models and structure...

Structured classification for multilingual natural language processing

This thesis investigates the application of structured sequence classification models to multilingual natural language processing (NLP). Many tasks tackled by NLP can be framed as classification, where we seek to assign a label to a particular piece of text, be it a word, sentence or document. Yet often the labels which we’d like to assign exhibit complex internal structure, such as labelling a...

Natural-Gradient Stochastic Variational Inference for Non-Conjugate Structured Variational Autoencoder

We propose a new variational inference method which uses recognition models for amortized inference in graphical models that contain deep generative models. Unlike many existing approaches, our method can handle non-conjugacy in both the latent graphical model and the deep generative model, and enables fully amortized inference at test time. Our method is based on an extension of a recently pro...


Publication date: 2014